Abstract. Geometry and Diophantine equations have been ever-present
in mathematics. Diophantus of Alexandria was born in the 3rd century
(as far as we know), but a systematic mathematical study of word equations
began only in the 20th century. So, the title of the present article
does not seem to be justified at all. However, a linear Diophantine equation
can be viewed as a special case of a system of word equations over a
unary alphabet, and, more importantly, a word equation can be viewed
as a special case of a Diophantine equation. Hence, the problem
WordEquations, “Is a given word equation solvable?”, is intimately related to
Hilbert’s 10th problem on the solvability of Diophantine equations. This
became clear to the Russian school of mathematics at the latest in the
mid 1960s, after which a systematic study of that relation began.

Here, we review some recent developments which led to an amazingly 
simple decision procedure for WordEquations, and to the description of 
the set of all solutions as an EDTOL language. 


Word Equations 

A word equation is easy to describe: it is a pair (U, V) where U and V are
strings over finite sets of constants A and variables $\Omega$. A solution is a mapping
$\sigma : \Omega \to A^*$ which is extended to a homomorphism
$\sigma : (A \cup \Omega)^* \to A^*$ such that $\sigma(U) = \sigma(V)$.
Word equations are also studied in other algebraic structures, and
frequently one is interested in more than satisfiability. For example, one may be
interested in all solutions, or only in solutions satisfying additional criteria like
rational constraints for free groups [B]. Here, we focus on the simplest case of
word equations over free monoids; and by WordEquations we understand the
formal language of all word equations (over a given finite alphabet A) which are
satisfiable, that is, for which there exists a solution.
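
To make the definition concrete, the following minimal Python sketch checks whether a given mapping is a solution of a word equation. It assumes, purely for illustration, that constants are lowercase letters and variables are uppercase letters; the function names are hypothetical.

```python
def apply_solution(sigma, word):
    """Extend sigma (a dict from variables to words) to a homomorphism.

    Symbols without an entry in sigma are treated as constants and
    are mapped to themselves.
    """
    return "".join(sigma.get(symbol, symbol) for symbol in word)


def is_solution(sigma, U, V):
    """Return True if sigma(U) == sigma(V) for the word equation (U, V)."""
    return apply_solution(sigma, U) == apply_solution(sigma, V)


# The equation abX = Yba with sigma(X) = sigma(Y) = "a": both sides become "aba".
good = is_solution({"X": "a", "Y": "a"}, "abX", "Yba")
# sigma(X) = "b", sigma(Y) = "a" gives "abb" versus "aba".
bad = is_solution({"X": "b", "Y": "a"}, "abX", "Yba")
```

Checking a candidate is easy; the difficulty of WordEquations lies in deciding whether any such mapping exists.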

History 

The problem WordEquations is closely related to the theory of Diophantine
equations. The publication of Hilbert’s 1900 address to the International Congress of
Mathematicians listed 23 problems. The tenth problem (Hilbert 10) is: 

“Given a Diophantine equation with any number of unknown quantities 
and with rational integral numerical coefficients: To devise a process 
according to which it can be determined in a finite number of operations 
whether the equation is solvable in rational integers.” 


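There is a natural encoding of a word equation as a Diophantine problem. It
is based on the fact that the two 2 × 2 integer matrices
$\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$
generate a free monoid. Moreover, these matrices generate exactly those
matrices in SL(2,Z) where all coefficients are natural numbers. This is actually
easy to show, and it is also used in the fast “fingerprint” pattern matching
algorithm by Karp and Rabin m-. A reduction from WordEquations to Hilbert 10
is now straightforward. For example, the equation abX = Yba is solvable if and
only if the following Diophantine system in unknowns $X_1, \ldots, Y_4$ is solvable
over integers:

$$\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} X_1 & X_2 \\ X_3 & X_4 \end{pmatrix} = \begin{pmatrix} Y_1 & Y_2 \\ Y_3 & Y_4 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$$

$$X_1 X_4 - X_2 X_3 = 1, \qquad Y_1 Y_4 - Y_2 Y_3 = 1$$

$$X_i \ge 0 \;\&\; Y_i \ge 0 \quad \text{for } 1 \le i \le 4$$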
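
The encoding can be tried out on small instances. The sketch below assumes the assignment of the letter a to the matrix with rows (1, 0), (1, 1) and of b to the matrix with rows (1, 1), (0, 1); the opposite assignment would serve equally well. It only checks injectivity on short words and the determinant condition, so it illustrates the freeness claim without proving it. All names are hypothetical.

```python
from itertools import product

# Assumed assignment of letters to the two generating matrices.
M = {"a": ((1, 0), (1, 1)), "b": ((1, 1), (0, 1))}


def mul(P, Q):
    """Product of two 2 x 2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def encode(word):
    """Image of a word over {a, b} under the monoid homomorphism."""
    R = ((1, 0), (0, 1))  # the empty word maps to the identity matrix
    for letter in word:
        R = mul(R, M[letter])
    return R


def det(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]


# All words of length at most 6 over {a, b}.
words = ["".join(w) for n in range(7) for w in product("ab", repeat=n)]
images = {encode(w) for w in words}

injective_on_short_words = len(images) == len(words)
all_in_sl2 = all(det(encode(w)) == 1 for w in words)

# sigma(X) = sigma(Y) = "a" solves abX = Yba; the matrix equation agrees.
X = Y = encode("a")
matrix_equation_holds = mul(encode("ab"), X) == mul(Y, encode("ba"))
```

Since every entry of a product is a polynomial in the entries of the factors, a word equation turns into a system of polynomial equations in the unknown matrix entries, together with the determinant and non-negativity conditions.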


The reduction of a Diophantine system to a single Diophantine equation is
classic. It is based on the fact that every natural number can be written as a sum of
four squares. In the mid 1960s the following mathematical project was launched:
show that Hilbert 10 is undecidable by showing that WordEquations is
undecidable. The hope was to encode the computations of a Turing machine into
a word equation. The project failed greatly, producing two great mathematical
achievements. In 1970 Matiyasevich showed that Hilbert 10 is undecidable, based
on number theory and previous work by Davis, Putnam, and Robinson, see the
textbook [H]. A few years later, in 1977, Makanin showed that WordEquations
is decidable m-.

In the 1980s, Makanin showed that the existential and positive theories of free
groups are decidable m-. In 1987 Razborov gave a description of all solutions for
an equation in a free group via “Makanin-Razborov” diagrams Eiiiig. Finally,
in a series of papers ending in [13], Kharlampovich and Myasnikov proved Tarski’s
conjectures dating back to the 1940s:

1. The elementary theory of free groups is decidable. 

2. Free non-abelian groups are elementarily equivalent.

The second result has also been shown independently by Sela [23] . 

It is not difficult to see (by encoding linear Diophantine systems over the
naturals) that WordEquations is NP-hard, but the first estimates of the
complexity of Makanin’s algorithm were something like



Over the years Makanin’s algorithm was modified to bring the complexity down
to EXPSPACE |H], see also the survey in [3. For equations in free groups the
complexity seemed to be much worse. Kościelski and Pacholski published a result
that the scheme of Makanin’s algorithm for free groups is not primitive recursive
M-. However, a few years later Plandowski and Rytter showed in m that
solutions of word equations can be compressed by Lempel-Ziv encodings (actually
by straight-line programs); and the conjecture was born that WordEquations
is in NP; and, moreover, the same should be true for word equations over free
groups. The conjecture has not yet been proved, but in 1999 Plandowski showed
that WordEquations is in PSPACE [T51ITU]. The same is true for equations in free
groups, and allowing rational constraints we obtain a PSPACE-complete problem
US El-.

In 2013 Jeż applied recompression to WordEquations and simplified all (!)
known proofs for decidability m-. Actually, using his method he could describe
all solutions of a word equation by a finite graph where the labels are of two
types. Either the label is a compression $c \mapsto ab$ where a, b, c are letters, or
the label is a linear Diophantine system. His method copes with free groups and
with rational constraints: this was done in [7].

Moreover, the method of Jeż led Ciobanu, Elder, and the present author to
an even simpler description for the set of all solutions: it is an EDTOL language
[3]. Such a simple structural description of solution sets was known before only
for quadratic word equations by [S].

The notion of an EDTOL system refers to Extended, Deterministic, Table,
0 interaction, and Lindenmayer. There is a vast literature on Lindenmayer
systems, see [33], but actually we need very little from the “Book of L”.


Rational sets of endomorphisms 

The starting point is a word equation (U, V) of length n over a set of constants
A and a set of variables $X_1, \ldots, X_k$ (without restriction, $|A| + k \le n$). There is
a nondeterministic algorithm which takes (U, V) as input and which runs in
NSPACE(n log n). The output is an extended alphabet $C \supseteq A$ of linear
size in n and a finite trim nondeterministic automaton $\mathcal{A}$ where the arc labels
are endomorphisms over $C^*$. The automaton $\mathcal{A}$ therefore accepts a rational set
$\mathcal{R} = L(\mathcal{A}) \subseteq \mathrm{End}(C^*)$, and enjoys various properties which are explained next.
The arc labels are restricted. An endomorphism used for an arc label is defined
by a mapping $c \mapsto u$ where $c \in C$ is a letter and u is some word of length at most
2. The monoid $\mathrm{End}(C^*)$ is neither free nor finitely generated, but $\mathcal{R}$ lives inside
a finitely generated submonoid $H^* \subseteq \mathrm{End}(C^*)$ where H is finite. Thus, we can
think of $\mathcal{R}$ as a rational (or regular) expression over a finite set of endomorphisms
H, as we are used to in standard formal language theory. For technical reasons it
is convenient to assume that C contains a special symbol # whose main purpose
is to serve as a marker. The algorithm is designed in such a way that it yields an
automaton $\mathcal{A}$ accepting a rational set $\mathcal{R}$ such that

$$\{h(\#) \mid h \in \mathcal{R}\} \subseteq \underbrace{A^* \# \cdots \# A^*}_{k-1 \text{ symbols } \#}.$$

Thus, applying the set of endomorphisms to the special symbol # we obtain
a formal language in $A^* \# \cdots \# A^*$. The set $\{h(\#) \mid h \in \mathcal{R}\}$ encodes a set of
k-tuples over $A^*$. Due to Asveld [T] we can take a description like $\{h(\#) \mid h \in \mathcal{R}\}$
as the very definition for EDTOL. Now, the result by Ciobanu et al. in [3] is the
following equality:

$$\{h(\#) \mid h \in \mathcal{R}\} = \{\sigma(X_1) \# \cdots \# \sigma(X_k) \mid \sigma(U) = \sigma(V)\}.$$

Here, $\sigma$ runs over all solutions of the equation (U, V). Hence, the set of all
solutions for a given word equation is an EDTOL language.
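
As a small illustration of how a rational set of endomorphisms generates a language from the marker symbol, the following Python sketch handles the trivial equation X = Y over A = {a, b}, whose solution set is encoded by the words w#w. The endomorphisms and the auxiliary letters c and d are chosen by hand for this toy case and are not the output of the algorithm from [3]; the sketch also assumes that the endomorphisms along a path are applied one after the other, starting from #. Each endomorphism maps one letter to a word of length at most 2 and fixes all other letters.

```python
from itertools import product


def apply(h, word):
    """Apply the endomorphism h (a dict on letters) to a word.

    Letters without an entry in h are fixed.
    """
    return "".join(h.get(letter, letter) for letter in word)


# Hand-written toy endomorphisms over C = {a, b, c, d, #}.
START = [{"#": "dc"}, {"d": "c#"}]            # '#' -> 'dc' -> 'c#c'
GROW = {"a": {"c": "ca"}, "b": {"c": "cb"}}   # grow both copies in parallel
ERASE = {"c": ""}                              # remove the auxiliary letter


def toy_language(max_steps):
    """All words h(#) for paths START, GROW^m, ERASE with m <= max_steps."""
    result = set()
    for m in range(max_steps + 1):
        for choice in product("ab", repeat=m):
            word = "#"
            for h in START:
                word = apply(h, word)
            for letter in choice:
                word = apply(GROW[letter], word)
            result.add(apply(ERASE, word))
    return result


small = toy_language(1)  # {'#', 'a#a', 'b#b'}
```

Because each table rewrites every occurrence of c simultaneously, both sides of the marker stay equal, which is exactly the kind of parallel rewriting that EDTOL systems provide.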

The results stated in [3] are more general.¹ They cope with the existential
theory of equations with rational constraints in finitely generated free products 
of free groups, finite groups, free monoids, and free monoids with involution. For 
example, they cover the existential theory of equations with rational constraints 
in the modular group PSL(2,Z). 

The NSPACE(n log n) algorithm produces some $\mathcal{A}$ whether or not (U, V) has
a solution. (If there is no solution, then the trimmed automaton $\mathcal{A}$ has no states
and hence accepts the empty set.) This shifts the viewpoint on how to solve equations.
The idea is that $\mathcal{A}$ answers basic questions about the solution set of (U, V).
Indeed, the construction in [3] is such that the following assertions hold.

– The equation (U, V) is solvable if and only if $L(\mathcal{A}) \ne \emptyset$.

– The equation (U, V) has infinitely many solutions if and only if $L(\mathcal{A})$ is
infinite.
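
Both questions reduce to elementary graph problems on the automaton. The following Python sketch is a simplification under stated assumptions: it treats arc labels as opaque symbols, so it decides whether the set of accepted label sequences is empty or infinite. Relating this to the set of endomorphisms, and hence to the solution set, relies on properties of the construction in [3] that are not reproduced here. The representation of the automaton as a list of triples and all names are hypothetical.

```python
def reachable(sources, successors):
    """States reachable from the given sources along the successor map."""
    seen, stack = set(sources), list(sources)
    while stack:
        p = stack.pop()
        for q in successors.get(p, ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def analyse(arcs, initial, finals):
    """Return (nonempty, infinite) for an automaton given by labelled arcs.

    arcs is a list of triples (source, label, target).
    """
    forward, backward = {}, {}
    for p, _label, q in arcs:
        forward.setdefault(p, set()).add(q)
        backward.setdefault(q, set()).add(p)

    # Trim part: states lying on some path from the initial to a final state.
    trim = reachable({initial}, forward) & reachable(set(finals), backward)
    nonempty = initial in trim

    # Infinitely many accepted paths exist iff the trim part has a cycle.
    # Repeatedly remove states without incoming arcs (Kahn's algorithm).
    indegree = {p: 0 for p in trim}
    for p in trim:
        for q in forward.get(p, ()):
            if q in trim:
                indegree[q] += 1
    queue = [p for p in trim if indegree[p] == 0]
    removed = 0
    while queue:
        p = queue.pop()
        removed += 1
        for q in forward.get(p, ()):
            if q in trim:
                indegree[q] -= 1
                if indegree[q] == 0:
                    queue.append(q)
    infinite = removed < len(trim)
    return nonempty, infinite


with_loop = analyse([(0, "f", 1), (1, "g", 1), (1, "h", 2)], 0, {2})
no_path = analyse([(0, "f", 1)], 0, {2})
single_path = analyse([(0, "f", 2)], 0, {2})
```

Both checks run in time linear in the size of the automaton once it has been constructed.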

In particular, decision problems like “Is (U, V) satisfiable?” or “Does (U, V)
have infinitely many solutions?” can be answered in NSPACE(n log n) for finitely
generated free products over free groups, finite groups, free monoids, and free
monoids with involution. Actually, we conjecture that NSPACE(n log n) is the
best complexity bound for WordEquations with respect to space. This conjecture
might hold even if the problem WordEquations were in NP.


How to solve a linear Diophantine system 

Many of the aspects of our method of solving word equations are present in the 
special case of solving a system of word equations over a unary alphabet. In 
this particular case Jeż’s recompression is closely related to [5]. There are many
other places where the following is explained, so in some sense we can view the 
rest of this section as folklore. 

Assume that Alice wants to explain to somebody, say Bob, in a very short 
time, say 15 minutes, that the set of solvable linear Diophantine systems over 
integers is decidable. Assume that this fundamental insight is entirely new to 
Bob. Alice might start to explain something with Cramer’s rule, determinants 
or Gaussian elimination, but Bob does not know any of these terms, so better 
not to start with a course on linear algebra within a time slot of 15 minutes. 

¹ Full proofs are in [4].

What Bob knows are basic matrix operations and the notion of a linear
Diophantine system:

AX = c, where $A \in \mathbb{Z}^{n \times n}$, $X = (X_1, \ldots, X_n)^T$ and $c \in \mathbb{Z}^n$.

Here, the $X_i$ are variables over natural numbers. (This is not essential, and
actually makes the problem more difficult than looking for a solution over integers.)

The complexity of the problem depends on the values n, $\|c\|_1 = \sum_i |c_i|$
and $\|A\|_1 = \sum_{i,j} |a_{ij}|$. Without restriction (by adding dummies) we have

$$\|c\|_1 \le \|A\|_1. \qquad (1)$$

Alice explains the compression algorithm with respect to a given solution
$x \in \mathbb{N}^n$. Of course, the algorithm does not know the solution, so the algorithm
uses nondeterministic guesses. This is allowed provided two properties are
satisfied: soundness and completeness. Soundness means that a guess can never
transform an unsolvable system into a solvable one. Completeness means that for
every solution x, there is some choice of correct guesses such that the procedure
terminates with a system which has a trivial solution.

So we begin by guessing a solution $x \in \mathbb{N}^n$. First, we can check whether x = 0
is a solution by looking at c. Indeed, x = 0 is a solution if and only if c = 0.

Hence, let us assume $x \ne 0$ (this might be possible even if c = 0). We define
a vector b = c. The vector b (and the solution x) will be modified during the
procedure. Perform the following while-loop.

while $x \ne 0$

1. For all i define $x'_i = x_i - 1$ if $x_i$ is odd and $x'_i = x_i$ otherwise. Thus, all $x'_i$
are even. Rewrite the system with a new vector b' such that Ax' = b'. Note
that

$$\|b'\|_1 \le \|b\|_1 + \|A\|_1. \qquad (2)$$

2. Now, all $b'_i$ must be even. Otherwise we made a mistake and x was not a
solution.

3. Define b'' = b'/2 and x'' = x'/2. We obtain a new system AX = b'' with
solution Ax'' = b''.

4. Rename b'' and x'' as b and x.

end while. 

The clue is that, since $\|b\|_1 \le \|A\|_1$ by Equation (1), we obtain by
Equation (2) and the third step an invariant:

$$\|b''\|_1 = \|b'\|_1/2 \le \|b\|_1/2 + \|A\|_1/2 \le \|A\|_1.$$
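
The loop and its invariant can be replayed on a concrete instance. The Python sketch below follows the four steps for a solution that is already known, so no guessing is needed; it records the successive vectors b and asserts the invariant in every round. The function name and the example system are assumptions made for illustration only.

```python
def compress(A, c, x):
    """Run the while-loop for a known solution x of A x = c.

    Returns the list of vectors b visited, starting with c.
    """
    n = len(A)
    norm_A = sum(abs(a) for row in A for a in row)   # ||A||_1
    b, x = list(c), list(x)
    if [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)] != b:
        raise ValueError("x is not a solution of A x = c")
    if sum(abs(v) for v in b) > norm_A:
        raise ValueError("assumption (1) is violated")

    trace = [tuple(b)]
    while any(x):
        # Step 1: subtract the parity vector, so that all entries are even.
        parity = [xi % 2 for xi in x]
        x1 = [xi - p for xi, p in zip(x, parity)]
        b1 = [b[i] - sum(A[i][j] * parity[j] for j in range(n))
              for i in range(n)]
        # Step 2: A x' = b' with x' even forces every entry of b' to be even.
        assert all(v % 2 == 0 for v in b1)
        # Steps 3 and 4: halve and rename.
        b = [v // 2 for v in b1]
        x = [v // 2 for v in x1]
        # Invariant: ||b||_1 <= ||A||_1.
        assert sum(abs(v) for v in b) <= norm_A
        trace.append(tuple(b))
    return trace


# Hypothetical example: x1 + 2*x2 = 3, padded with a zero row.
A = [[1, 2], [0, 0]]
trace_30 = compress(A, (3, 0), (3, 0))
trace_11 = compress(A, (3, 0), (1, 1))
```

The number of rounds equals the bit length of the largest entry of x, and the last vector of the trace is always 0 because b = Ax holds throughout.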

The procedure is obviously sound. It is complete because in each round $\|x\|_1$
decreases and therefore termination is guaranteed for every solution as long as
we make correct guesses. The final observation is that the procedure defines a
finite graph. The vertices are the vectors $b \in \mathbb{Z}^n$ with $\|b\|_1 \le \|A\|_1$. There are
at most exponentially many such vectors. We are done! It is reported that the
explanation of Alice took less than 15 minutes. It is not reported whether Bob understood.

Alice’s explanation has a bonus: there is more information. We can label the
arcs according to our guesses with affine mappings of two types: either $x \mapsto x + 1_I$
or $x \mapsto 2x$. Here $1_I$ denotes the characteristic vector of a non-empty set
$I \subseteq \{1, \ldots, n\}$.

Thus, we have a finite graph of at most exponential size where the arc labels
are affine mappings of type $x \mapsto \lambda x + 1_I$ with $\lambda \in \{1, 2\}$ and $I \subseteq \{1, \ldots, n\}$.
Letting b = 0 be the initial state and the initial vector c the final state, we have
a nondeterministic finite automaton which accepts a rational set $\mathcal{R}$ of affine
mappings from $\mathbb{N}^n$ to itself. By construction, we obtain

$$\{x \in \mathbb{N}^n \mid Ax = c\} = \{h(0) \mid h \in \mathcal{R}\}.$$
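
The equality can be explored by walking the graph from the state b = 0 towards c. The Python sketch below is a simplified variant rather than the exact automaton of the text: it merges one doubling arc and one optional increment arc into a single step $x \mapsto 2x + 1_I$ (with I possibly empty), keeps only states with $\|b\|_1 \le \|A\|_1$, and enumerates all solutions whose entries have at most a given number of bits. The function name, the bit bound and the example system are assumptions for illustration.

```python
from itertools import product


def solutions_up_to(A, c, max_bits):
    """All x in N^n with A x = c whose entries are below 2**max_bits."""
    n = len(A)
    norm_A = sum(abs(a) for row in A for a in row)   # ||A||_1
    target = tuple(c)
    found = set()
    # Each pair (x, b) satisfies b = A x; b is the current state.
    frontier = {((0,) * n, (0,) * n)}
    for _ in range(max_bits + 1):
        for x, b in frontier:
            if b == target:
                found.add(x)
        next_frontier = set()
        for x, b in frontier:
            for bits in product((0, 1), repeat=n):
                # Combined arc: x -> 2x + 1_I and b -> 2b + A 1_I.
                x2 = tuple(2 * xi + e for xi, e in zip(x, bits))
                b2 = tuple(
                    2 * b[i] + sum(A[i][j] * bits[j] for j in range(n))
                    for i in range(n)
                )
                # States outside the norm bound cannot lie on a path
                # belonging to a solution, by the invariant.
                if sum(abs(v) for v in b2) <= norm_A:
                    next_frontier.add((x2, b2))
        frontier = next_frontier
    return found


# Hypothetical example: x1 + 2*x2 = 3, padded with a zero row.
sols = solutions_up_to([[1, 2], [0, 0]], (3, 0), 3)
```

Reading a path from the initial state builds a solution from its most significant bits downwards, which mirrors the while-loop run in reverse.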